Non-parametric Bayesian modelling of digital gene expression data
Next-generation sequencing technologies provide a revolutionary tool for
generating gene expression data. Starting with a fixed RNA sample, they
construct a library of millions of differentially abundant short sequence tags
or "reads", which constitute a fundamentally discrete measure of the level of
gene expression. A common limitation in experiments using these technologies is
the low number or even absence of biological replicates, which complicates the
statistical analysis of digital gene expression data. Analysis of this type of
data has often been based on modified tests originally devised for analysing
microarrays; both these and even de novo methods for the analysis of RNA-seq
data are plagued by the common problem of low replication. We propose a novel,
non-parametric Bayesian approach for the analysis of digital gene expression
data. We begin with a hierarchical model of over-dispersed count
data and a blocked Gibbs sampling algorithm for inferring the posterior
distribution of model parameters conditional on these counts. The algorithm
compensates for the problem of low numbers of biological replicates by
clustering together genes with tag counts that are likely sampled from a common
distribution and using this augmented sample for estimating the parameters of
this distribution. The number of clusters is not decided a priori, but it is
inferred along with the remaining model parameters. We demonstrate the ability
of this approach to model biological data with high fidelity by applying the
algorithm to a public dataset obtained from cancerous and non-cancerous neural
tissues.
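The clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model. It assumes a Poisson likelihood with a conjugate Gamma prior in place of an over-dispersed count model, and a Dirichlet-process prior truncated at `K` clusters through stick-breaking. The names `blocked_gibbs` and `log_poisson`, and all hyperparameter values, are hypothetical. Genes whose replicate counts look alike end up sharing a cluster, whose rate is then estimated from the pooled counts, and the number of occupied clusters is read off the labels rather than fixed in advance.

```python
import math
import random

def log_poisson(y, lam):
    """Log-probability of count y under Poisson(lam)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def blocked_gibbs(counts, K=10, alpha=1.0, a=2.0, b=0.1, iters=50, seed=0):
    """Truncated stick-breaking (blocked) Gibbs sampler for a Dirichlet-process
    mixture of Poisson rates with a conjugate Gamma(shape=a, rate=b) base measure.
    A simplified stand-in for an over-dispersed count model (assumption).

    counts: one list of replicate counts per gene.
    Returns the cluster label of each gene after the final sweep.
    """
    rng = random.Random(seed)
    n = len(counts)
    z = [0] * n                      # start with all genes in a single cluster
    for _ in range(iters):
        genes = [0] * K              # genes assigned to each cluster
        obs = [0] * K                # observations (gene x replicate) per cluster
        total = [0] * K              # summed counts per cluster
        for i, ys in enumerate(counts):
            genes[z[i]] += 1
            obs[z[i]] += len(ys)
            total[z[i]] += sum(ys)
        # Stick-breaking weights: V_k ~ Beta(1 + n_k, alpha + sum_{j>k} n_j), last V = 1.
        weights, remaining = [], 1.0
        for k in range(K):
            v = 1.0 if k == K - 1 else rng.betavariate(1.0 + genes[k], alpha + sum(genes[k + 1:]))
            weights.append(remaining * v)
            remaining *= 1.0 - v
        # Conjugate rate update; gammavariate takes a scale, so scale = 1 / rate.
        lam = [rng.gammavariate(a + total[k], 1.0 / (b + obs[k])) for k in range(K)]
        # Reassign every gene given all weights and rates, so clusters can empty or fill.
        for i, ys in enumerate(counts):
            logp = [math.log(w) + sum(log_poisson(y, lam[k]) for y in ys) if w > 0 else -math.inf
                    for k, w in enumerate(weights)]
            top = max(logp)
            probs = [math.exp(lp - top) for lp in logp]
            z[i] = rng.choices(range(K), weights=probs)[0]
    return z
```

On toy data with a few low-count and a few high-count genes, the two groups should land in disjoint clusters, each rate being estimated from all genes that share it.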
Structural and non-coding variants increase the diagnostic yield of clinical whole genome sequencing for rare diseases
BACKGROUND: Whole genome sequencing (WGS) is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because, although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome.
METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline designed to allow comprehensive analysis of all variant types, combining established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants.
RESULTS: Our diagnostic yield was 43/122 cases (35%), rising to 47/122 cases (39%) when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments were made for eight individuals, which were considered life-saving for five patients.
CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.
Whole-genome sequencing of chronic lymphocytic leukemia identifies subgroups with distinct biological and clinical features
The value of genome-wide over targeted driver analyses for predicting clinical outcomes of cancer patients is debated. Here, we report the whole-genome sequencing of 485 chronic lymphocytic leukemia patients enrolled in clinical trials as part of the United Kingdom's 100,000 Genomes Project. We identify an extended catalog of recurrent coding and noncoding genetic mutations that represents a source for future studies and provide the most complete high-resolution map of structural variants, copy number changes and global genome features including telomere length, mutational signatures and genomic complexity. We demonstrate the relationship of these features with clinical outcome and show that integration of 186 distinct recurrent genomic alterations defines five genomic subgroups that associate with response to therapy, refining conventional outcome prediction. While requiring independent validation, our findings highlight the potential of whole-genome sequencing to inform future risk stratification in chronic lymphocytic leukemia.
Consistency management with repair actions
Comprehensive consistency management requires a strong mechanism for repair once inconsistencies have been detected. In this paper we present a repair framework for inconsistent distributed documents. The core piece of the framework is a new method for generating interactive repairs from full first-order logic formulae that constrain these documents. We present a full implementation of the components in our repair framework, as well as their application to the UML and related heterogeneous documents such as EJB deployment descriptors. We describe how our approach can be used as an infrastructure for building higher-level, domain-specific frameworks and provide an overview of related work in the database and software development environment communities.
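To make "generating repairs from first-order formulae" concrete, here is a toy sketch in the spirit of the abstract; it is not the paper's method or implementation. It assumes a hypothetical mini-language (`ForAll`, `Exists`, `Eq`, `Not`, attribute references `Ref` and literals `Const`) over a document represented as a dict of element lists, and only covers a few formula shapes. For each element violating a top-level universal constraint, it lists alternative single edits (add a witness, or change an attribute) that would each make the constraint hold for that element, which a user could then pick from interactively.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical mini constraint language over documents (dict of element lists).
@dataclass
class Ref:            # attribute of a bound variable, e.g. m.name
    var: str
    attr: str

@dataclass
class Const:          # literal value
    value: object

@dataclass
class Eq:             # lhs == rhs
    lhs: Union[Ref, Const]
    rhs: Union[Ref, Const]

@dataclass
class Not:
    body: object

@dataclass
class Exists:         # exists var in doc[set_name]: body
    var: str
    set_name: str
    body: object

@dataclass
class ForAll:         # for all var in doc[set_name]: body
    var: str
    set_name: str
    body: object

def value(term, env, doc):
    if isinstance(term, Const):
        return term.value
    set_name, idx = env[term.var]
    return doc[set_name][idx][term.attr]

def label(ref, env):
    set_name, idx = env[ref.var]
    return f"{set_name}[{idx}].{ref.attr}"

def holds(f, env, doc):
    if isinstance(f, Eq):
        return value(f.lhs, env, doc) == value(f.rhs, env, doc)
    if isinstance(f, Not):
        return not holds(f.body, env, doc)
    # Quantifier: evaluate the body once per element of the quantified set.
    results = (holds(f.body, {**env, f.var: (f.set_name, i)}, doc)
               for i in range(len(doc[f.set_name])))
    return all(results) if isinstance(f, ForAll) else any(results)

def repairs(f, env, doc, want=True):
    """Alternative single edits, each of which makes f evaluate to `want`."""
    if holds(f, env, doc) == want:
        return []
    if isinstance(f, Not):
        return repairs(f.body, env, doc, not want)
    if isinstance(f, Eq):
        options = []
        for target, other in ((f.lhs, f.rhs), (f.rhs, f.lhs)):
            if isinstance(target, Ref):   # only attributes can be edited, not literals
                v = value(other, env, doc)
                options.append(f"set {label(target, env)} to {v!r}" if want
                               else f"change {label(target, env)} to differ from {v!r}")
        return options
    if isinstance(f, Exists) and want:
        # Either add a witness, or repair the body for some existing element.
        options = [f"add an element to {f.set_name} satisfying the constraint"]
        for i in range(len(doc[f.set_name])):
            options += repairs(f.body, {**env, f.var: (f.set_name, i)}, doc, True)
        return options
    raise NotImplementedError("other formula shapes are omitted from this sketch")

def check(rule, doc):
    """For a top-level ForAll, report each violating element with its repair options."""
    report = []
    for i in range(len(doc[rule.set_name])):
        env = {rule.var: (rule.set_name, i)}
        if not holds(rule.body, env, doc):
            report.append((f"{rule.set_name}[{i}]", repairs(rule.body, env, doc)))
    return report

# Example: every sequence-diagram message needs an operation with the same name.
doc = {"messages": [{"name": "login"}, {"name": "logout"}],
       "operations": [{"name": "login"}, {"name": "logut"}]}
rule = ForAll("m", "messages",
              Exists("o", "operations", Eq(Ref("o", "name"), Ref("m", "name"))))
for element, options in check(rule, doc):
    print(element, options)
```

Here `messages[1]` is reported with five alternatives, among them adding a new operation or renaming `operations[1].name` to `'logout'`. A realistic framework would also need to check whether a chosen edit introduces new inconsistencies elsewhere, which this sketch does not attempt.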
Simultaneous estimation of hidden model states (including intracellular calcium concentrations) and maximal conductances in a two-compartment model of a vertebrate motoneuron (II).
Inference of maximal conductances and noise parameters during fixed-lag smoothing. (A) The standard deviations of the observation (Ai) and the intrinsic (Aii) noise at the soma and the dendrite. (B) Inferred maximal conductances of the sodium and potassium currents at the soma (Bi), of the N-type calcium current and the calcium-activated potassium current at the soma (Bii), of the calcium-activated potassium current at the dendrite (Biii) and of the N-type and L-type calcium currents at the dendrite (Biv). In all cases, parameter expectations gradually converged towards the true parameter values (dashed lines) after less than . The grey lines in Aii, Biii and Biv correspond to parameters estimated when current was injected into the soma only. In these simulations, , , and the prior interval for was .
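As a rough illustration of fixed-lag smoothing with parameters inferred alongside hidden states, the sketch below swaps the two-compartment conductance model for a hypothetical scalar AR(1) state, x_t = a x_{t-1} + noise, observed with Gaussian noise. The unknown coefficient `a` is appended to each particle's state with a small artificial random-walk jitter, a common self-organizing state-space device; each particle also carries its last `lag` states, so resampling corrects recent history as well. Function names, the uniform prior on `a`, noise levels and particle counts are all assumptions, not the settings behind this figure.

```python
import math
import random

def simulate(a, T, q=1.0, r=0.1, seed=0):
    """Simulate x_t = a*x_{t-1} + N(0, q), observed as y_t = x_t + N(0, r)."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(T):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return ys

def fixed_lag_smoother(ys, lag=5, n=1000, q=1.0, r=0.1, jitter=0.01, seed=1):
    """Bootstrap particle filter on the augmented state (x, a) that resamples each
    particle's last `lag` states along with it.

    Returns (estimates of x_{t-lag} given y_1..y_t, mean of a after the last step).
    """
    rng = random.Random(seed)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # assumed prior on the coefficient
    paths = [[0.0] for _ in range(n)]                # per-particle recent state history
    smoothed = []
    for y in ys:
        logw = []
        for i in range(n):
            a[i] += rng.gauss(0.0, jitter)           # artificial parameter dynamics
            x = a[i] * paths[i][-1] + rng.gauss(0.0, math.sqrt(q))
            paths[i] = (paths[i] + [x])[-(lag + 1):]  # keep the lag + 1 newest states
            logw.append(-0.5 * (y - x) ** 2 / r)     # Gaussian observation log-likelihood
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        idx = rng.choices(range(n), weights=w, k=n)  # resample whole recent paths
        a = [a[j] for j in idx]
        paths = [paths[j] for j in idx]
        if len(paths[0]) == lag + 1:
            # Oldest retained state, averaged over equally weighted particles.
            smoothed.append(sum(p[0] for p in paths) / n)
    return smoothed, sum(a) / n
```

With a fixed seed the final mean of `a` should settle near the simulated value, although particle degeneracy on static parameters means the estimate varies with the seed, the particle count and the jitter size.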
True and estimated values and prior intervals used during smoothing for all parameters in the two-compartment conductance-based model.
1. These parameter values were estimated when we used the broad prior intervals (see Fig. 11Ai).
2. Values in bold indicate the narrow prior intervals we used for generating Figs. 11Aii, 11B, 11C (and Supplementary Figs. S4 and S5).